Normalized compression distance is a way of measuring the similarity between two objects, be it two documents, two letters, two emails, two music scores, two languages, two programs, two pictures, two systems, or two genomes, to name a few. Such a measurement should not be application dependent or arbitrary. A reasonable definition for the similarity between two objects is how difficult it is to transform them into each other.

==Information distance==
We assume that the objects one talks about are finite strings of 0s and 1s, so we mean string similarity. Every computer file is of this form: if an object is a file in a computer, it is of this form. One can define the information distance between strings <math>x</math> and <math>y</math> as the length of the shortest program <math>p</math> that computes <math>x</math> from <math>y</math> and vice versa. This shortest program is in a fixed programming language; for technical reasons one uses the theoretical notion of Turing machines. Moreover, to express the length of <math>p</math> one uses the notion of Kolmogorov complexity. Then it has been shown〔C.H. Bennett, P. Gács, M. Li, P.M.B. Vitányi, and W. Zurek, "Information Distance", IEEE Trans. Inform. Theory, IT-44:4 (1998), 1407–1423.〕 that

:<math>E(x,y) = \max\{K(x\mid y), K(y\mid x)\},</math>

up to logarithmic additive terms which can be ignored. This information distance is shown to be a metric (it satisfies the metric inequalities up to a logarithmic additive term) and to be universal (it minorizes every computable distance, as computed for example from features, up to a constant additive term).
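Kolmogorov complexity itself is not computable, so in practice one approximates it by the output length of a real-world compressor. As a minimal sketch (not from the cited paper), the Python fragment below approximates <math>K(x\mid y)</math> by <math>C(yx) - C(y)</math>, where <math>C</math> denotes compressed length, and uses it to estimate the max-based information distance above; the helper names and the choice of zlib as the compressor are illustrative assumptions.

<syntaxhighlight lang="python">
import zlib


def c(data: bytes) -> int:
    """Approximate the Kolmogorov complexity K(data) by zlib-compressed length."""
    return len(zlib.compress(data, 9))


def information_distance(x: bytes, y: bytes) -> int:
    """Estimate E(x, y) = max{K(x|y), K(y|x)}.

    K(x|y) is approximated by C(yx) - C(y): the extra compressed
    bytes needed to describe x when y is already available.
    """
    return max(c(y + x) - c(y), c(x + y) - c(x))


if __name__ == "__main__":
    s1 = b"the quick brown fox jumps over the lazy dog" * 20
    s2 = b"the quick brown fox leaps over the lazy cat" * 20
    s3 = bytes(range(256)) * 4  # unrelated, poorly compressible data
    print(information_distance(s1, s2))  # small: similar strings
    print(information_distance(s1, s3))  # larger: dissimilar strings
</syntaxhighlight>

Dividing the same compressor-based estimate by <math>\max\{C(x), C(y)\}</math> yields the normalized compression distance of the article's title, which puts the distance on a scale comparable across objects of different sizes.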